Estimating sample-domain distortion in the transform domain with rounding compensation

ABSTRACT

Techniques and tools are described for compensating for rounding when estimating sample-domain distortion in the transform domain. For example, a video encoder estimates pixel-domain distortion in the transform domain for a block of transform coefficients after compensating for rounding in the DC coefficient of the block. In this way, the video encoder improves the accuracy of pixel-domain distortion estimation but retains the computational advantages of performing the estimation in the transform domain. Rounding compensation includes, for example, looking up an index (from a de-quantized transform coefficient) in a rounding offset table to determine a rounding offset, then adjusting the coefficient by the offset. Other techniques and tools described herein are directed to creating rounding offset tables and encoders that make encoding decisions after considering rounding effects that occur after an inverse frequency transform on de-quantized transform coefficient values.

BACKGROUND

Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 frames per second. Each frame can include tens or hundreds of thousands of pixels (also called pels), where each pixel represents a tiny element of the picture. In raw form, a computer commonly represents a pixel as a set of three samples totaling 24 bits. Thus, the number of bits per second, or bit rate, of a typical raw digital video sequence may be 5 million bits per second or more.
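The bit rate figure above follows from simple multiplication. The following is a minimal sketch of that arithmetic; the function name and the 160×120 frame size are illustrative assumptions and do not come from the application.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_pixel=24):
    """Bits per second for uncompressed video with the given dimensions."""
    return width * height * bits_per_pixel * frames_per_second

# Hypothetical example: a small 160x120 frame (19,200 pixels) at 15 frames per second.
rate = raw_bit_rate(160, 120, 15)
print(rate)  # 6912000
```

Even this small frame size yields roughly 6.9 million bits per second, which is consistent with the "5 million bits per second or more" figure.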

Many computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original video from the compressed form. A “codec” is an encoder/decoder system. Compression can be lossless, in which the quality of the video does not suffer, but decreases in bit rate are limited by the inherent amount of variability (sometimes called entropy) of the video data. Or, compression can be lossy, in which the quality of the video suffers, but achievable decreases in bit rate are more dramatic. Lossy compression is often used in conjunction with lossless compression: the lossy compression establishes an approximation of information, and the lossless compression is applied to represent the approximation.

A basic goal of lossy compression is to provide good rate-distortion performance. So, for a particular bit rate, an encoder attempts to provide the highest quality of video. Or, for a particular level of quality/fidelity to the original video, an encoder attempts to provide the lowest bit rate encoded video. In practice, considerations such as encoding time, encoding complexity, encoding resources, decoding time, decoding complexity, decoding resources, overall delay, and/or smoothness in quality/bit rate changes also affect decisions made in codec design as well as decisions made during actual encoding.

In general, video compression techniques include “intra-picture” compression and “inter-picture” compression. Intra-picture compression techniques compress individual pictures, and inter-picture compression techniques compress pictures with reference to a preceding and/or following picture (often called a reference or anchor picture) or pictures.

I. Intra Compression

FIG. 1 illustrates block-based intra compression in an example encoder. In particular, FIG. 1 illustrates intra compression of an 8×8 block (105) of samples by the encoder. The encoder splits a picture into 8×8 blocks of samples and applies a forward 8×8 frequency transform (110) (such as a discrete cosine transform (“DCT”)) to individual blocks such as the block (105). The encoder quantizes (120) the transform coefficients (115), resulting in an 8×8 block of quantized transform coefficients (125).

With quantization, the encoder essentially trades off quality and bit rate. More specifically, quantization can affect the fidelity with which the transform coefficients are encoded, which in turn can affect bit rate. Coarser quantization tends to decrease fidelity to the original transform coefficients as the coefficients are more coarsely approximated. Bit rate also decreases, however, when decreased complexity can be exploited with lossless compression. Conversely, finer quantization tends to preserve fidelity and quality but results in higher bit rates.

Different encoders use different parameters for quantization. In most encoders, a level or step size of quantization is set for a block, picture, or other unit of video. In some encoders, the encoder can also adjust the “dead zone,” which is the range of values around zero that are approximated as zero. Some encoders quantize coefficients differently within a given block, so as to apply relatively coarser quantization to perceptually less important coefficients, and a quantization matrix can be used to indicate the relative weights. Or, apart from the rules used to reconstruct quantized values, some encoders vary the thresholds according to which values are quantized so as to quantize certain values more aggressively than others.
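The effect of step size and dead zone can be illustrated with a small sketch of uniform scalar quantization. The rounding rule, the function names, and the sample coefficient values are assumptions chosen for illustration; actual encoders differ in how they round and in how they define the dead zone.

```python
def quantize(coeff, step, dead_zone=0.0):
    """Uniform scalar quantization (illustrative rule, not a specific codec's).

    Coefficients whose magnitude is below dead_zone are forced to level 0;
    all others are rounded to the nearest multiple of step.
    """
    if abs(coeff) < dead_zone:
        return 0
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + 0.5)

def dequantize(level, step):
    """Reconstruct an approximate coefficient from its quantized level."""
    return level * step

coeffs = [100, -37, 12, 5, -3]                  # hypothetical transform coefficients
fine = [quantize(c, 8) for c in coeffs]         # finer step size
coarse = [quantize(c, 16) for c in coeffs]      # coarser step size
print(fine, [dequantize(q, 8) for q in fine])
print(coarse, [dequantize(q, 16) for q in coarse])
print([quantize(c, 8, dead_zone=8) for c in coeffs])  # wider dead zone
```

With the coarser step size, more coefficients collapse to zero and the reconstructed values drift further from the originals, which is the quality versus bit rate tradeoff described above.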

Returning to FIG. 1, further encoding varies depending on whether a coefficient is a DC coefficient (the lowest frequency coefficient shown as the top left coefficient in the block (125)), an AC coefficient in the top row or left column in the block (125), or another AC coefficient. The encoder typically encodes the DC coefficient (126) as a differential from the reconstructed DC coefficient (136) of a neighboring 8×8 block. The encoder entropy encodes (140) the differential. The entropy encoder can encode the left column or top row of AC coefficients as differentials from AC coefficients of a corresponding left column or top row of a neighboring 8×8 block. The encoder scans (150) the 8×8 block (145) of predicted, quantized AC coefficients into a one-dimensional array (155). The encoder then entropy encodes the scanned coefficients using a variation of run/level coding (160).
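The scan and run/level stages can be sketched as follows. The zigzag order is an assumption, since the scan pattern of the example encoder is not specified here. For brevity the sketch scans a whole 4×4 block rather than only the AC coefficients of an 8×8 block, and it drops trailing zeros instead of signaling an end-of-block symbol.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):
        # Positions on the anti-diagonal where row + col == s, top to bottom.
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Alternate the direction of travel on successive anti-diagonals.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def scan(block):
    """Flatten a square block into a one-dimensional list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_level(values):
    """Encode values as (run of zeros, non-zero level) pairs; trailing zeros are dropped."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

# Hypothetical quantized block with a few low-frequency non-zero values.
block = [[0, 3, 0, 0],
         [-2, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(run_level(scan(block)))  # [(1, 3), (0, -2), (0, 1)]
```

Because the scan groups low-frequency coefficients first, the non-zero values cluster at the start of the array and the long tail of zeros costs little to represent.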

In corresponding decoding, a decoder produces a reconstructed version of the original 8×8 block. The decoder entropy decodes the quantized transform coefficients, scanning the quantized coefficients into a two-dimensional block, and performing AC prediction and/or DC prediction as needed. The decoder inverse quantizes the quantized transform coefficients of the block and applies an inverse frequency transform (such as an inverse DCT (“IDCT”)) to the de-quantized transform coefficients, producing the reconstructed version of the original 8×8 block. When a picture is used as a reference picture in subsequent motion compensation (see below), an encoder also reconstructs the picture.

II. Inter Compression

Inter-picture compression techniques often use motion estimation and motion compensation to reduce bit rate by exploiting temporal redundancy in a video sequence. Motion estimation is a process for estimating motion between pictures. In one common technique, an encoder using motion estimation attempts to match a block of samples in a current picture with a block of samples in a search area in another picture, called the reference picture. When the encoder finds an exact or “close enough” match in the search area in the reference picture, the encoder parameterizes the change in position of the blocks as motion data (such as a motion vector). In general, motion compensation is a process of reconstructing pictures from reference picture(s) using motion data.

FIG. 2 illustrates motion estimation for part of a predicted picture in an example encoder. For an 8×8 block of samples, 16×16 block (often called a “macroblock”), or other unit of the current picture, the encoder finds a similar unit in a reference picture for use as a predictor. In FIG. 2, the encoder computes a motion vector for a 16×16 macroblock (215) in the current, predicted picture (210). The encoder searches in a search area (235) of a reference picture (230). Within the search area (235), the encoder compares the macroblock (215) from the predicted picture (210) to various candidate macroblocks in order to find a candidate macroblock that is a good match. The encoder outputs information specifying the motion vector to the predictor macroblock.
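The search described for FIG. 2 can be sketched as an exhaustive block-matching loop. The matching criterion (sum of absolute differences), the function names, and the synthetic test pictures are assumptions for illustration; practical encoders typically use faster search patterns than the exhaustive search shown.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, size, search_range):
    """Try every integer displacement in the search area; return (cost, dx, dy) of the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference picture.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic 16x16 reference picture in which every sample value is unique.
W = 16
ref = [[y * W + x for x in range(W)] for y in range(W)]
# Current picture: the reference content displaced by (dx, dy) = (2, 1), clamped at the edges.
cur = [[ref[min(y + 1, W - 1)][min(x + 2, W - 1)] for x in range(W)] for y in range(W)]

print(full_search(cur, ref, bx=4, by=4, size=4, search_range=3))  # (0, 2, 1)
```

The returned displacement plays the role of the motion vector, and the zero cost indicates an exact match for this synthetic input.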

The encoder computes the sample-by-sample difference between the current unit and the predictor to determine a residual (also called error signal). The residual is frequency transformed, quantized, and entropy encoded. The overall bit rate of a predicted picture depends in large part on the bit rate of residuals. The bit rate of residuals is low if the residuals are simple (i.e., due to motion estimation that finds exact or good matches) or lossy compression drastically reduces the complexity of the residuals. Bits saved with successful motion estimation can be used to improve quality elsewhere or reduce overall bit rate. On the other hand, the bit rate of complex residuals can be higher, depending on the degree of lossy compression applied to reduce the complexity of the residuals.

Encoders typically spend a large proportion of encoding time performing motion estimation, attempting to find good matches and thereby improve rate-distortion performance. In most scenarios, however, an encoder lacks the time or resources to check every possible motion vector for every block or macroblock to be encoded. The encoder therefore uses motion vector search patterns and matching heuristics deemed likely to find a good match in an acceptable amount of time.

The number of motion vectors used to represent a picture can also affect rate-distortion performance. Using four motion vectors for four different 8×8 blocks of a 16×16 macroblock (instead of one motion vector for the macroblock) allows an encoder to capture different motion for the different blocks, potentially resulting in better matches. On the other hand, motion vector information for four motion vectors (instead of one) is signaled, increasing bit rate of motion data.

FIG. 3 illustrates compression of a prediction residual for a motion-compensated block of a predicted picture in an example encoder. The encoder computes an 8×8 prediction error block (335) as the difference between a predicted block (315) and a current 8×8 block (325).

The encoder applies a frequency transform (340) to the residual (335), producing a block of transform coefficients (345). Some encoders switch between different sizes of transforms, e.g., an 8×8 transform, two 4×8 transforms, two 8×4 transforms, or four 4×4 transforms for an 8×8 prediction residual block. Smaller transform sizes allow for greater isolation of transform coefficients having non-zero values, but generally require more signaling overhead. FIG. 3 shows the encoder using one 8×8 transform.

The encoder quantizes (350) the transform coefficients (345) and scans (360) the quantized coefficients (355) into a one-dimensional array (365) such that coefficients are generally ordered from lowest frequency to highest frequency. The encoder entropy codes the data in the array (365).

If a predicted picture is used as a reference picture for subsequent motion compensation, the encoder reconstructs the predicted picture. When reconstructing residuals, the encoder reconstructs transform coefficients that were quantized and performs an inverse frequency transform. The encoder performs motion compensation to compute the motion-compensated predictors, and combines the predictors with the residuals. During decoding, a decoder typically entropy decodes information and performs analogous operations to reconstruct residuals, perform motion compensation, and combine the predictors with the reconstructed residuals.

III. Computing Pixel-Domain Distortion When Making Encoding Decisions

The previous two sections mention some of the decisions that an encoder can make during encoding. When encoding a block of a predicted picture, an encoder can evaluate and set a number of coding parameters, including: (1) whether the block should be encoded as intra or inter; (2) the number of motion vectors; (3) the value(s) of motion vector(s); (4) the type of frequency transform; (5) the size of frequency transform (e.g., 8×8, 4×8, 8×4, or 4×4); (6) the quantization step size; (7) the quantization thresholds to apply; (8) the dead zone size; and (9) the quantization matrix. Or, for a block of an intra-coded picture, the encoder can evaluate and set various quantization-related parameters. Depending on implementation, an encoder may finalize certain parameter decisions before starting to evaluate other parameters. Or, the encoder may jointly explore different combinations of coding parameters, which makes the decision-making process even more complex given the number of permutations to evaluate.

In making encoding decisions, an encoder often evaluates the distortion and rate associated with the different choices. In particular, for a block to be encoded, pixel-domain distortion of the block encoded according to different coding choices is an important criterion in encoder mode decisions. There are several approaches to determining pixel-domain distortion.

In one approach, an encoder performs inverse quantization to reconstruct transform coefficients for a block and performs an inverse frequency transform on the de-quantized transform coefficients. The encoder directly measures pixel-domain distortion by comparing the reconstructed pixel-domain values for the block to the original pixel-domain values for the block. While this approach yields accurate pixel-domain distortion measurements, it is expensive in terms of encoding time and resources. Performing an inverse frequency transform for every evaluated coding choice greatly increases the computational complexity of the encoding task. As a result, encoding time increases or more encoding resources are required. Or, to handle practical time or resource constraints, an encoder evaluates fewer coding options, which can result in the encoder missing efficient options.

In another approach, an encoder performs inverse quantization to reconstruct transform coefficients for a block but measures distortion in the transform domain. The encoder measures transform-domain distortion by comparing the de-quantized transform coefficients for the block to the original transform coefficients for the block. To estimate pixel-domain distortion for the block, the encoder can multiply the transform-domain distortion by a scale factor that depends on the frequency transform used. If the transform is orthogonal, the encoder multiplies the transform-domain distortion by a non-zero scale factor so that the energy in the transform domain is roughly equivalent to the energy in the pixel domain. In this approach, the encoder does not perform an inverse frequency transform for every evaluated coding choice, so computational complexity is lowered. The pixel-domain distortion estimated by this approach is often inaccurate, however, particularly when only the DC coefficient of a block has a significant value. This inaccuracy in pixel-domain distortion estimation can lead to inefficient choices of coding parameters and poor rate-distortion performance.
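The gap between the two approaches can be seen in a small numerical sketch. It assumes an orthonormal 4×4 DCT (for which the scale factor is 1), sum of squared differences as the distortion measure, and rounding of reconstructed samples to integers; the transform, the block values, and the de-quantized DC value of 5 are all illustrative assumptions rather than details of any particular encoder.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def forward_dct(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def inverse_dct(coeffs):
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

N = 4
original = [[1] * N for _ in range(N)]   # flat block, so only the DC coefficient is significant
orig_coeffs = forward_dct(original)      # DC is 4; AC coefficients are numerically zero

# Hypothetical de-quantized coefficients: DC reconstructed as 5, all AC as 0.
dequantized = [[0.0] * N for _ in range(N)]
dequantized[0][0] = 5.0

# Transform-domain estimate (scale factor 1 for this orthonormal transform).
transform_estimate = ssd(orig_coeffs, dequantized)

# Direct measurement: inverse transform, round to integer samples, compare.
reconstructed = [[round(v) for v in row] for row in inverse_dct(dequantized)]
pixel_distortion = ssd(original, reconstructed)

print(transform_estimate, pixel_distortion)
```

In this sketch each reconstructed sample is 1.25 before rounding and 1 after, so the measured pixel-domain distortion is 0 while the transform-domain estimate is about 1. The estimate misses the effect of rounding after the inverse transform.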

Given the critical importance of video compression to digital video, it is not surprising that video compression is a richly developed field. Whatever the benefits of previous video compression techniques, however, they do not have the advantages of the following techniques and tools.

SUMMARY

The present application is directed to techniques and tools for compensating for rounding when estimating sample-domain distortion in the transform domain. For example, a video encoder estimates pixel-domain distortion in the transform domain for a block of transform coefficients after compensating for rounding effects in the DC coefficient of the block. In this way, the video encoder improves the accuracy of the pixel-domain distortion estimation but retains the computational advantages of performing the estimation in the transform domain.

According to a first aspect of the described techniques and tools, a tool such as a video encoder compensates for rounding in a coefficient of a set of transform coefficients. The tool estimates sample-domain distortion using the rounding-compensated coefficient and other transform coefficients of the set. The tool then makes a decision based on the estimated distortion and outputs results.

For example, a video encoder compensates for rounding by looking up a table index (determined from a DC coefficient of a block of de-quantized transform coefficients) in a rounding offset table to determine a rounding offset, then adjusting the DC coefficient by the rounding offset. When estimating distortion, the encoder computes the difference between the original DC coefficient and the rounding-compensated DC coefficient. The encoder eventually selects between intra and inter encoding for the block based upon distortion estimates.
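The lookup and adjustment can be sketched as follows. The table contents, the use of the DC value modulo the table length as the index, and the sum-of-squared-differences measure are assumptions: they correspond to a hypothetical orthonormal 8×8 transform whose reconstructed samples equal DC/8 rounded half up, not to the offset table of any particular encoder.

```python
# Hypothetical offsets for an orthonormal 8x8 transform with round-half-up
# sample rounding; entry i applies to de-quantized DC values with dc % 8 == i.
ROUNDING_OFFSETS = [0, -1, -2, -3, 4, 3, 2, 1]

def compensate_dc(dc):
    """Adjust an integer de-quantized DC coefficient by its rounding offset."""
    index = dc % len(ROUNDING_OFFSETS)   # table index determined from the DC coefficient
    return dc + ROUNDING_OFFSETS[index]

def estimate_distortion(orig_coeffs, dequant_coeffs):
    """Transform-domain distortion estimate with rounding compensation of the DC term.

    Both arguments are flat lists of coefficients with the DC coefficient at index 0.
    """
    dc_diff = orig_coeffs[0] - compensate_dc(dequant_coeffs[0])
    ac_part = sum((o - q) ** 2 for o, q in zip(orig_coeffs[1:], dequant_coeffs[1:]))
    return dc_diff ** 2 + ac_part

# Flat 8x8 block of ones: original DC is 8. De-quantized DC of 10 gives samples
# of 1.25, which round to 1, so the block is actually reconstructed exactly.
orig = [8] + [0] * 63
dequant = [10] + [0] * 63
print(compensate_dc(10), estimate_distortion(orig, dequant))  # 8 0
```

Without the compensation, the same inputs would produce an estimate of (8 - 10)² = 4 even though the reconstructed samples match the originals.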

According to a second aspect of the described techniques and tools, an encoder includes a frequency transformer, a quantizer, an entropy encoder, an inverse quantizer and a controller. The controller makes encoding decisions after considering post-inverse frequency transform rounding effects on de-quantized transform coefficient values.

According to a third aspect of the described techniques and tools, a range of values for a de-quantized transform coefficient is identified. A rounding offset is computed for each of multiple values in the range. A periodic pattern in the offsets is identified, and representative values are mapped to corresponding rounding offsets in an offset table. The corresponding rounding offsets show at least one period of the pattern without the table including all values in the range. In this way, table size is reduced. The table is stored in computer storage or elsewhere. For example, the offset table is created off-line and distributed with a video encoder for use during video encoding.
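The table-creation steps can be sketched as follows. The rounding model (an orthonormal n×n transform whose reconstructed samples equal DC/n rounded half up), the function names, and the range of -64 to 63 are illustrative assumptions; an actual encoder would derive offsets from its own inverse transform.

```python
def rounding_offset(dc, n):
    """Offset that moves dc to the DC value implied by the rounded samples.

    Assumes each reconstructed sample is dc / n rounded half up, so the
    effective DC after rounding is n * floor((dc + n/2) / n).
    """
    return n * ((dc + n // 2) // n) - dc

def find_period(seq):
    """Smallest p such that seq repeats with period p over its whole length."""
    for p in range(1, len(seq) + 1):
        if all(seq[i] == seq[i % p] for i in range(len(seq))):
            return p

def build_offset_table(lo, hi, n):
    """Compute offsets over [lo, hi], detect the period, and keep one period only."""
    offsets = [rounding_offset(dc, n) for dc in range(lo, hi + 1)]
    period = find_period(offsets)
    table = [0] * period
    for i in range(period):
        dc = lo + i
        table[dc % period] = offsets[i]   # index the table by dc modulo the period
    return table

table = build_offset_table(-64, 63, 8)
print(len(table), table)  # 8 [0, -1, -2, -3, 4, 3, 2, 1]
```

Here 128 values in the range collapse to an 8-entry table, which illustrates how storing a single period reduces table size.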

This summary introduces a selection of concepts in a simplified form. The concepts are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

The foregoing and other objects, features, and advantages will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing encoding of a block with intra-picture compression according to the prior art.

FIG. 2 is a diagram showing motion estimation according to the prior art.

FIG. 3 is a diagram showing encoding of a block with inter-picture compression according to the prior art.

FIG. 4 is a block diagram of a suitable computing environment in which several described embodiments may be implemented.

FIG. 5 is a block diagram of a video encoder system in conjunction with which several described embodiments may be implemented.

FIG. 6 is a block diagram of a generalized tool for estimating sample-domain distortion in the transform domain using rounding compensation.

FIG. 7 is a flowchart of a generalized technique for estimating sample-domain distortion from transform coefficients with rounding compensation.

FIG. 8 is a flowchart of a technique for estimating pixel-domain distortion from transform coefficients with rounding compensation during encoding.

FIG. 9 is a flowchart of a technique for creating a rounding offset table.

DETAILED DESCRIPTION

The present application relates to techniques and tools for estimating sample-domain distortion in the transform domain with rounding compensation. In various described embodiments, a video encoder incorporates techniques for estimating pixel-domain distortion in the transform domain with rounding compensation.

Various alternatives to the implementations described herein are possible. For example, certain techniques described with reference to flowchart diagrams can be altered by changing the ordering of stages shown in the flowcharts, by repeating or omitting certain stages, etc. The various techniques and tools described herein can be used in combination or independently. Different embodiments implement one or more of the described techniques and tools. Aside from uses in video encoding, sample-domain distortion estimation in the transform domain with rounding compensation can be used in image encoding, video transcoding, image classification, or other areas.

Some of the techniques and tools described herein address one or more of the problems noted in the Background. Typically, a given technique/tool does not solve all such problems. Rather, in view of constraints and tradeoffs in encoding time, resources, and/or quality, the given technique/tool improves encoding performance for a particular implementation or scenario.

I. Computing Environment

FIG. 4 illustrates a generalized example of a suitable computing environment (400) in which several of the described embodiments may be implemented. The computing environment (400) is not intended to suggest any limitation as to scope of use or functionality, as the techniques and tools may be implemented in diverse general-purpose or special-purpose computing environments.

With reference to FIG. 4, the computing environment (400) includes at least one processing unit (410) and memory (420). In FIG. 4, this most basic configuration (430) is included within a dashed line. The processing unit (410) executes computer-executable instructions and may be a real or a virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory (420) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. The memory (420) stores software (480) implementing an encoder with one or more of the described techniques and tools for sample-domain distortion estimation in the transform domain with rounding compensation.

A computing environment may have additional features. For example, the computing environment (400) includes storage (440), one or more input devices (450), one or more output devices (460), and one or more communication connections (470). An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment (400). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment (400), and coordinates activities of the components of the computing environment (400).

The storage (440) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment (400). The storage (440) stores instructions for the software (480) implementing the video encoder.

The input device(s) (450) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment (400). For audio or video encoding, the input device(s) (450) may be a sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD-ROM or CD-RW that reads audio or video samples into the computing environment (400). The output device(s) (460) may be a display, printer, speaker, CD-writer, or another device that provides output from the computing environment (400).

The communication connection(s) (470) enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.

The techniques and tools can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment (400), computer-readable media include memory (420), storage (440), communication media, and combinations of any of the above.

The techniques and tools can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.

For the sake of presentation, the detailed description uses terms like “decide” and “analyze” to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.

II. Generalized Video Encoder

FIG. 5 is a block diagram of a generalized video encoder (500) in conjunction with which some described embodiments may be implemented. The encoder (500) receives a sequence of video pictures including a current picture (505) and produces compressed video information (595) as output to storage, a buffer, or a communications connection. The format of the output bitstream can be a Windows Media Video or VC-1 format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, or H.264), or other format.

The encoder (500) processes video pictures. The term picture generally refers to source, coded or reconstructed image data. For progressive video, a picture is a progressive video frame. For interlaced video, a picture may refer to an interlaced video frame, the top field of the frame, or the bottom field of the frame, depending on the context. The encoder (500) is block-based and uses a 4:2:0 macroblock format for frames, with each macroblock including four 8×8 luminance blocks (at times treated as one 16×16 macroblock) and two 8×8 chrominance blocks. For fields, the same or a different macroblock organization and format may be used. The 8×8 blocks may be further sub-divided at different stages, e.g., at the frequency transform and entropy encoding stages. The encoder (500) can perform operations on sets of samples of different size or configuration than 8×8 blocks and 16×16 macroblocks. Alternatively, the encoder (500) is object-based or uses a different macroblock or block format.

Returning to FIG. 5, the encoder system (500) compresses predicted pictures and intra-coded, key pictures. For the sake of presentation, FIG. 5 shows a path for key pictures through the encoder system (500) and a path for predicted pictures. Many of the components of the encoder system (500) are used for compressing both key pictures and predicted pictures. The exact operations performed by those components can vary depending on the type of information being compressed.

A predicted picture (e.g., progressive P-frame or B-frame, interlaced P-field or B-field, or interlaced P-frame or B-frame) is represented in terms of prediction from one or more other pictures (which are typically referred to as reference pictures or anchors). A prediction residual is the difference between predicted information and corresponding original information. In contrast, a key picture (e.g., progressive I-frame, interlaced I-field, or interlaced I-frame) is compressed without reference to other pictures.

If the current picture (505) is a predicted picture, a motion estimator (510) estimates motion of macroblocks or other sets of samples of the current picture (505) with respect to one or more reference pictures. The picture store (520) buffers a reconstructed previous picture (525) for use as a reference picture. When multiple reference pictures are used, the multiple reference pictures can be from different temporal directions or the same temporal direction. The encoder system (500) can use the separate stores (520) and (522) for multiple reference pictures.

The motion estimator (510) can estimate motion by full-sample, ½-sample, ¼-sample, or other increments, and can switch the precision of the motion estimation on a picture-by-picture basis or other basis. The motion estimator (510) (and compensator (530)) also can switch between types of reference picture sample interpolation (e.g., between bicubic and bilinear) on a per-picture or other basis. The precision of the motion estimation can be the same or different horizontally and vertically. The motion estimator (510) outputs as side information motion information (515) such as differential motion vector information. The encoder (500) encodes the motion information (515) by, for example, computing one or more motion vector predictors for motion vectors, computing differentials between the motion vectors and motion vector predictors, and entropy coding the differentials. To reconstruct a motion vector, a motion compensator (530) combines a motion vector predictor with differential motion vector information.
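Differential coding of motion vectors can be sketched as follows. The component-wise median of three neighboring motion vectors is used as the predictor purely as an assumption for illustration; the text above does not specify how predictors are computed, and the function names and sample vectors are hypothetical.

```python
def median3(a, b, c):
    """Median of three scalar values."""
    return sorted((a, b, c))[1]

def predict_mv(left, top, top_right):
    """Component-wise median predictor from three neighboring motion vectors (assumed rule)."""
    return (median3(left[0], top[0], top_right[0]),
            median3(left[1], top[1], top_right[1]))

def mv_differential(mv, predictor):
    """Differential that would be entropy coded in place of the motion vector itself."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])

def reconstruct_mv(differential, predictor):
    """Combine predictor and differential, as the motion compensator does."""
    return (predictor[0] + differential[0], predictor[1] + differential[1])

# Hypothetical neighboring motion vectors and the current block's motion vector.
pred = predict_mv((4, -2), (6, 0), (5, -1))
diff = mv_differential((6, -1), pred)
print(pred, diff, reconstruct_mv(diff, pred))  # (5, -1) (1, 0) (6, -1)
```

When neighboring blocks move similarly, the differentials are small and cost fewer bits to entropy code than the motion vectors themselves.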

The motion compensator (530) applies the reconstructed motion vectors to the reconstructed (reference) picture(s) (525) when forming a motion-compensated current picture (535). The difference (if any) between a block of the motion-compensated current picture (535) and corresponding block of the original current picture (505) is the prediction residual (545) for the block. During later reconstruction of the current picture, reconstructed prediction residuals are added to the motion compensated current picture (535) to obtain a reconstructed picture that is closer to the original current picture (505). In lossy compression, however, some information is still lost from the original current picture (505). Alternatively, a motion estimator and motion compensator apply another type of motion estimation/compensation.
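The residual and reconstruction steps can be sketched on a small block. The 2×2 block size, the sample values, the lossy residual, and the clamping of reconstructed samples to an 8-bit range are illustrative assumptions.

```python
def prediction_residual(current, predicted):
    """Sample-by-sample difference between the current block and its predictor."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, predicted)]

def reconstruct(predicted, residual, max_value=255):
    """Add a (possibly lossy) residual back to the predictor, clamping to the sample range."""
    return [[max(0, min(max_value, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predicted, residual)]

current = [[52, 55], [61, 59]]      # hypothetical original samples
predicted = [[50, 56], [60, 60]]    # hypothetical motion-compensated predictor

residual = prediction_residual(current, predicted)
print(residual)                                   # [[2, -1], [1, -1]]
print(reconstruct(predicted, residual))           # exact residual recovers the original

# A residual coarsened by lossy compression gives a closer, but not exact, picture.
lossy_residual = [[2, 0], [0, 0]]
print(reconstruct(predicted, lossy_residual))     # [[52, 56], [60, 60]]
```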

A frequency transformer (560) converts spatial domain video information into frequency domain (i.e., spectral, transform) data. For block-based video pictures, the frequency transformer (560) applies a DCT, variant of DCT, or other forward block transform to blocks of the samples or prediction residual data, producing blocks of frequency transform coefficients. Alternatively, the frequency transformer (560) applies another conventional frequency transform such as a Fourier transform or uses wavelet or sub-band analysis. The frequency transformer (560) may apply an 8×8, 8×4, 4×8, 4×4 or other size frequency transform.

A quantizer (570) then quantizes the blocks of transform coefficients. The quantizer (570) applies uniform, scalar quantization to the spectral data with a step-size that varies on a picture-by-picture basis or other basis. The quantizer (570) can also apply another type of quantization to the spectral data coefficients, for example, a non-uniform, vector, or non-adaptive quantization. In addition to adaptive quantization, the encoder (500) can use frame dropping, adaptive filtering, or other techniques for rate control.

When a reconstructed current picture is needed for subsequent motion estimation/compensation, an inverse quantizer (576) performs inverse quantization on the quantized spectral data coefficients. An inverse frequency transformer (566) performs an inverse frequency transform, producing blocks of reconstructed prediction residuals (for a predicted picture) or samples (for a key picture). If the current picture (505) was a key picture, the reconstructed key picture is taken as the reconstructed current picture (not shown). If the current picture (505) was a predicted picture, the reconstructed prediction residuals are added to the motion-compensated predictors (535) to form the reconstructed current picture. One or both of the picture stores (520, 522) buffers the reconstructed current picture for use in subsequent motion-compensated prediction. In some embodiments, the encoder applies a de-blocking filter to the reconstructed frame to adaptively smooth discontinuities and other artifacts in the picture.

The entropy coder (580) compresses the output of the quantizer (570) as well as certain side information (e.g., motion information (515), quantization step size). Typical entropy coding techniques include arithmetic coding, differential coding, Huffman coding, run length coding, LZ coding, dictionary coding, and combinations of the above. The entropy coder (580) typically uses different coding techniques for different kinds of information, and can choose from among multiple code tables within a particular coding technique.

The entropy coder (580) provides compressed video information (595) to the multiplexer (“MUX”) (590). The MUX (590) may include a buffer, and a buffer level indicator may be fed back to a controller. Before or after the MUX (590), the compressed video information (595) can be channel coded for transmission over the network. The channel coding can apply error detection and correction data to the compressed video information (595).

A controller (not shown) receives inputs from various modules such as the motion estimator (510), frequency transformer (560), quantizer (570), inverse quantizer (576), entropy coder (580), and buffer (590). The controller evaluates intermediate results during encoding, for example, estimating distortion and performing other rate-distortion analysis. The controller works with modules such as the motion estimator (510), frequency transformer (560), quantizer (570), and entropy coder (580) to set and change coding parameters during encoding. When an encoder evaluates different coding parameter choices during encoding, the encoder may iteratively perform certain stages (e.g., quantization and inverse quantization) to evaluate different parameter settings. The encoder may set parameters at one stage before proceeding to the next stage. Or, the encoder may jointly evaluate different coding parameters, for example, jointly making an intra/inter block decision and selecting motion vector values, if any, for a block. The tree of coding parameter decisions to be evaluated, and the timing of corresponding encoding, depend on implementation.

The relationships shown between modules within the encoder (500) indicate general flows of information in the encoder; other relationships are not shown for the sake of simplicity. In particular, FIG. 5 usually does not show side information indicating the encoder settings, modes, tables, etc. used for a video sequence, picture, macroblock, block, etc. Such side information, once finalized, is sent in the output bitstream, typically after entropy encoding of the side information.

Particular embodiments of video encoders typically use a variation or supplemented version of the generalized encoder (500). Depending on implementation and the type of compression desired, modules of the encoder can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. For example, the controller can be split into multiple controller modules associated with different modules of the encoder. In alternative embodiments, encoders with different modules and/or other configurations of modules perform one or more of the described techniques.

III. Estimating Sample-Domain Distortion with Rounding Compensation

Techniques and tools described herein provide ways to estimate sample-domain distortion accurately in the transform domain. In particular, an encoder or other tool estimates sample-domain distortion using transform coefficients, after compensating for at least some of the rounding that would occur following an inverse frequency transform.

A. Theory and Explanation.

When selecting certain coding parameters, an encoder evaluates the distortion and/or rate associated with different coding parameter choices in order to improve rate-distortion performance. In particular, pixel-domain distortion is an important factor in encoding decisions in many systems. The pixel-domain distortion for a block is based on differences between original sample values for the block and reconstructed sample values for the block. As such, the pixel-domain distortion reflects fidelity changes introduced throughout encoding (e.g., from quantization) and reconstruction (e.g., from rounding after an inverse frequency transform).

One problem with computing distortion in the sample domain is that fully reconstructing sample values requires an inverse frequency transform. This adds a small computational cost every time a coding parameter or combination is evaluated with pixel-domain distortion measurement. Even if the cost of performing a single inverse frequency transform is small, in the aggregate, the computational cost becomes significant.

One way to avoid the cost of performing an inverse frequency transform per distortion measurement is to estimate pixel-domain distortion in the transform domain, with reference to transform coefficients. By speeding up estimation, a particular set of coding parameters can be evaluated more quickly, or more coding parameters can be evaluated within a particular duration of time. Such transform-domain estimations are inaccurate in many cases, however, in that they do not correlate well with corresponding pixel-domain distortion measurements, even when scaling factors compensate for differences in pixel-domain energy and transform-domain energy.

For example, suppose an encoder frequency transforms an 8×8 block of uniform sample values (all “1”) into the following 8×8 block of transform coefficients.

$\begin{matrix}{\begin{bmatrix}7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\end{bmatrix}.} & (a)\end{matrix}$

Following quantization and inverse quantization, suppose the 8×8 block of transform coefficients has the following de-quantized values.

$\begin{matrix}{\begin{bmatrix}5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\end{bmatrix}.} & (b)\end{matrix}$

If the encoder estimates pixel-domain distortion using the transform coefficients of blocks (a) and (b), the difference in DC coefficient values indicates distortion has been introduced. This might not correlate with pixel-domain distortion, however. Suppose that when the encoder applies an inverse frequency transform to block (b), due to rounding effects after the inverse frequency transform, an 8×8 block of uniform sample values (all “1”) is reconstructed. In that case, effectively no pixel-domain distortion has been introduced, contrary to the estimate of pixel-domain distortion made using only the transform coefficients.

Depending on the distortion metric used, block (b) might even be considered less favorable than another block such as:

$\begin{matrix}{\begin{bmatrix}7 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\end{bmatrix}.} & (c)\end{matrix}$

If the encoder applies an inverse frequency transform to block (c), however, the 8×8 block may have non-uniform sample values due to the effect of the non-zero AC coefficient. This shows another mismatch between estimated and actual pixel-domain distortion values.

In many implementations, after an encoder performs an integer forward frequency transform, the results of the transform are not normalized completely. In other words, the intermediate representation of information has a higher resolution than the original representation. This allows the encoder to retain precision in intermediate results. Although rounding may occur in the intermediate results after the forward integer transform, the rounding often is insignificant in absolute terms and relative to subsequent quantization. After the encoder or a decoder performs the integer inverse frequency transform, however, rounding effects are typically much more significant. After the inverse transform, results are returned to the resolution of the original representation, which can include normalization to account for retained precision/expansion from the forward transform as well as expansion from the inverse transform.

More formally, suppose $\hat{X}$ represents de-quantized transform coefficients for a block, and $\hat{x}$ represents the inverse frequency transformed de-quantized transform coefficients, $T^{-1}(\hat{X})$. The de-quantized transform coefficients are not necessarily the same as the transform of $\hat{x}$. For example, $\hat{X}_{Block(b)}$ does not equal $T(T^{-1}(\hat{X}_{Block(b)}))$. Due to rounding effects, distortion calculated in the transform domain using de-quantized transform coefficient values $\hat{X}$ and original transform coefficient values $X$ is biased and may lead the encoder to make a wrong decision.

For many inverse transforms, the DC coefficient has consistent and pronounced rounding effects. In contrast, rounding effects due to AC coefficients are less predictable and less pronounced. A typical frequency transform matrix has a top row of matrix coefficients with all positive values. Subsequent rows have positive and negative values. For many patterns of information, positive and negative effects cancel each other out for an AC coefficient, but the positive values of the top row consistently affect rounding due to the DC coefficient. In addition, the DC coefficient affects all samples in a block in the same way, whereas an AC coefficient affects different samples differently. AC coefficients also tend to have smaller magnitudes than DC coefficients. Finally, to the extent AC coefficients do have rounding effects, the rounding effects for different AC coefficients often cancel each other out.

Therefore, in some embodiments, one or more rounding offsets parameterize the difference(s) between $T(\hat{x})$ and $\hat{X}$, thereby accounting for rounding effects that follow an inverse frequency transform. In some implementations, rounding compensation accounts for rounding effects from DC coefficients but not AC coefficients. For a block of de-quantized transform coefficients $\hat{X}$, an encoder compensates for rounding effects in the DC coefficient $\hat{X}(0,0)$ of the block before estimating sample-domain distortion in the transform domain. The encoder does not adjust the de-quantized AC coefficients of the block, as the overall effect of AC rounding on distortion calculation is typically negligible. Alternatively, rounding compensation accounts for rounding effects in at least some AC coefficients. For example, rounding compensation accounts for rounding effects for the top row or left column of AC coefficients for a block of transform coefficients.

The encoder determines a rounding offset for a particular DC coefficient $\hat{X}(0,0)$ by looking up an index for the DC coefficient in a table that maps indices to rounding offsets. The table is implemented as an array or other data structure. In a simple case, the index is the DC coefficient itself, and the table includes a rounding offset for each possible de-quantized DC coefficient value. The rounding offset is then added to the DC coefficient to produce a rounding-compensated DC coefficient.

$\hat{X}^{rc}(0,0) = \hat{X}(0,0) + \mathrm{DCOFFSET}[\hat{X}(0,0)],$

where $\hat{X}^{rc}(0,0)$ is the rounding-compensated DC coefficient, and DCOFFSET[ ] is a table mapping indices to corresponding rounding offsets. Or, the DC coefficient is first mapped to an integer ranging from 0 to $N_{rounding}-1$, which is used as an index to a table that maps indices to rounding offsets.

$\hat{X}^{rc}(0,0) = \hat{X}(0,0) + \mathrm{DCOFFSET}[f(\hat{X}(0,0))],$

where f(·) is an index-mapping function that maps a de-quantized DC coefficient to an index in the range $[0, N_{rounding}-1]$, and DCOFFSET[ ] has length $N_{rounding}$.
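The two lookup variants above can be sketched in a few lines of Python. This is only an illustration: the function name `compensate_dc`, the 2×2 stand-in blocks, and every offset value in the two tables are hypothetical. The entry that maps a DC value of 5 to an offset of +2 is chosen to mirror the block (a)/block (b) example, under the assumption that block (b) reconstructs to the same samples as block (a).

```python
def compensate_dc(coeffs, dc_offset, index_of=lambda dc: dc):
    """Return a copy of a de-quantized block with its DC coefficient
    adjusted by dc_offset[index_of(DC)]; AC coefficients are untouched."""
    compensated = [row[:] for row in coeffs]
    dc = coeffs[0][0]
    compensated[0][0] = dc + dc_offset[index_of(dc)]
    return compensated


# Simple case: the index is the DC value itself (hypothetical values 0..7).
# The entry for 5 is +2, mirroring blocks (a) and (b) above.
DIRECT_OFFSETS = [0, 0, 0, 0, 0, 2, 0, 0]
block_b = [[5, 0], [0, 0]]  # 2x2 stand-in for the 8x8 block (b)
rc_direct = compensate_dc(block_b, DIRECT_OFFSETS)


# Mapped case: hypothetical even DC values in [-4, 4] map to indices 0..4,
# so the table has length N_rounding = 5.
MAPPED_OFFSETS = [1, -1, 0, 1, -1]


def even_index(dc):
    """Hypothetical index-mapping function f for even values in [-4, 4]."""
    return (dc + 4) // 2


rc_mapped = compensate_dc([[-2, 3], [0, 0]], MAPPED_OFFSETS, even_index)
```

Returning a copy rather than modifying the block in place keeps the uncompensated de-quantized coefficients available for any later stage that still needs them.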

In the sample domain, distortion is computed using original sample values and reconstructed sample values. Suppose $x$ denotes a block of sample values, and $\hat{x}$ denotes a block of reconstructed sample values obtained through inverse transform of $\hat{X}$: $\hat{x} = T^{-1}(\hat{X})$. Distortion measured in the sample domain is based on $x$ and $\hat{x}$. The distortion metric D can be defined in different ways. For example, D can be a sum of squared differences or errors (“SSE”):

${D = {\sum\limits_{{{for}\mspace{14mu} {all}\mspace{14mu} i},j}\left( {x_{i,j} - {\hat{x}}_{i,j}} \right)^{2}}},$

where $x_{i,j}$ and $\hat{x}_{i,j}$ are elements of $x$ and $\hat{x}$, respectively, for all elements i and j. Or, D can be a sum of absolute differences (“SAD”) or mean squared error (“MSE”) metric.

The cost of performing an inverse transform $T^{-1}(\cdot)$ on a large number of possible coding parameters or permutations for each block can be very high in the aggregate. Given the rounding-compensated de-quantized coefficient matrix $\hat{X}^{rc}$ and original transform coefficients $X$, however, sample-domain distortion can be estimated without performing the inverse transform $T^{-1}(\cdot)$. Adapting the SSE metric, one rounding-compensated sample-domain distortion metric is:

$D^{rc} = {\sum\limits_{{{for}\mspace{14mu} {all}\mspace{14mu} i},j}{\left( {{\hat{X}}_{i,j}^{rc} - X_{i,j}} \right)^{2}.}}$

Alternatively, an adapted SAD metric is used:

$D^{r\; c} = {\sum\limits_{{{for}\mspace{14mu} {all}\mspace{14mu} i},j}{{{{\hat{X}}_{i,j}^{rc} - X_{i,j}}}.}}$

Or, an adapted MSE metric is used, where I×J is the number of coefficients in the block:

$D^{rc} = {\frac{\sum\limits_{{{for}\mspace{14mu} {all}\mspace{14mu} i},j}{{{\hat{X}}_{i,j}^{rc} - X_{i,j}}}^{2}}{I \times J}.}$

In some implementations, the distortion metric is scaled by a transform-specific scaling factor α, for example, as follows:

$D^{rc} = {\alpha {\sum\limits_{{{for}\mspace{14mu} {all}\mspace{14mu} i},j}{\left( {{\hat{X}}_{i,j}^{rc} - X_{i,j}} \right)^{2}.}}}$
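The adapted metrics can be sketched directly from the formulas above. The function names and the 2×2 stand-in blocks below are hypothetical, and `alpha` defaults to 1.0 so that the unscaled and scaled forms share one function.

```python
def _differences(x_rc, x):
    """Element-wise differences between rounding-compensated de-quantized
    coefficients x_rc and original transform coefficients x."""
    return [a - b for row_rc, row in zip(x_rc, x) for a, b in zip(row_rc, row)]


def sse(x_rc, x, alpha=1.0):
    """Adapted SSE, optionally scaled by a transform-specific factor alpha."""
    return alpha * sum(d * d for d in _differences(x_rc, x))


def sad(x_rc, x):
    """Adapted SAD."""
    return sum(abs(d) for d in _differences(x_rc, x))


def mse(x_rc, x):
    """Adapted MSE: the SSE divided by the I x J coefficients of the block."""
    count = len(x) * len(x[0])
    return sse(x_rc, x) / count


# 2x2 stand-ins: original DC of 7 versus a de-quantized DC of 5 that has
# not been rounding-compensated.
X = [[7, 0], [0, 0]]
X_hat = [[5, 0], [0, 0]]
unscaled = sse(X_hat, X)
scaled = sse(X_hat, X, alpha=0.8)
```

All three functions operate purely on transform coefficients, so no inverse transform is needed to evaluate them.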

The scaling factor accounts for differences between pixel-domain energy and transform-domain energy when the applied transform is non-unitary. When an encoder applies a frequency transform, the transform matrix values may cause a difference in energy in the transform domain and pixel domain. This is particularly true for integer transforms that are not completely normalized. In general, the scaling factor α for a transform depends on the norms of the transform. For example, consider the following transforms:

$T_{8} = \begin{bmatrix}12 & 12 & 12 & 12 & 12 & 12 & 12 & 12 \\16 & 15 & 9 & 4 & {- 4} & {- 9} & {- 15} & {- 16} \\16 & 6 & {- 6} & {- 16} & {- 16} & {- 6} & 6 & 16 \\15 & {- 4} & {- 16} & {- 9} & 9 & 16 & 4 & {- 15} \\12 & {- 12} & {- 12} & 12 & 12 & {- 12} & {- 12} & 12 \\9 & {- 16} & 4 & 15 & {- 15} & {- 4} & 16 & {- 9} \\6 & {- 16} & 16 & {- 6} & {- 6} & 16 & {- 16} & 6 \\4 & {- 9} & 15 & {- 16} & 16 & {- 15} & 9 & {- 4}\end{bmatrix}, \quad T_{4} = \begin{bmatrix}17 & 17 & 17 & 17 \\22 & 10 & {- 10} & {- 22} \\17 & {- 17} & {- 17} & 17 \\10 & {- 22} & 22 & {- 10}\end{bmatrix}.$

A VC-1 encoder can perform forward 4×4, 4×8, 8×4, and 8×8 transforms on a residual data block $D_{i\times j}$ (having i rows and j columns) as follows:

$\hat{D}_{4\times 4} = (T_{4} \cdot D_{4\times 4} \cdot T_{4}') \circ N_{4\times 4}$ for a 4×4 transform,

$\hat{D}_{8\times 4} = (T_{8} \cdot D_{8\times 4} \cdot T_{4}') \circ N_{8\times 4}$ for an 8×4 transform,

$\hat{D}_{4\times 8} = (T_{4} \cdot D_{4\times 8} \cdot T_{8}') \circ N_{4\times 8}$ for a 4×8 transform, and

$\hat{D}_{8\times 8} = (T_{8} \cdot D_{8\times 8} \cdot T_{8}') \circ N_{8\times 8}$ for an 8×8 transform,

where · indicates a matrix multiplication, $\circ N_{i\times j}$ indicates a component-wise multiplication by a normalization factor, $T'$ indicates the transpose of the matrix $T$, and $\hat{D}_{i\times j}$ represents the transform coefficient block. The values of the normalization matrix $N_{i\times j}$ are given by:

$N_{i\times j} = c_{i}' \cdot c_{j},$

where:

${c_{4} = \begin{pmatrix}\frac{8}{289} & \frac{8}{292} & \frac{8}{289} & \frac{8}{292}\end{pmatrix}},{and}$ $c_{8} = {\begin{pmatrix}\frac{8}{288} & \frac{8}{289} & \frac{8}{292} & \frac{8}{289} & \frac{8}{288} & \frac{8}{289} & \frac{8}{292} & \frac{8}{289}\end{pmatrix}.}$

The forward transform causes an average expansion of a little more than 34² in the transform coefficients, and the inverse transform causes an average expansion of a little more than 34² in the other direction. Collectively, the inverse transform includes right shifts by 10 for simple normalization, which corresponds to division by 1024 with truncation. So, the normalization in the inverse transform is essentially by (1/32)². The forward transform includes normalization by an average of roughly (1/36)². The average normalization in the forward transform (roughly (1/36)²) more than compensates for the average expansion (roughly 34²), so as to simplify normalization in the inverse transform for decoder implementations. As a result, considering expansion and normalization of the forward transform, the average effect is 34²/36²≈0.9. As such, an example scaling factor α between sample-domain distortion and transform domain distortion for an adapted SSE metric is 0.9²≈0.8.
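The figures quoted above can be checked numerically. The following sketch computes the squared row norms of the two transform matrices, confirms that each listed normalization entry equals 32 divided by the corresponding squared row norm, and evaluates the example scaling factor. The variable names are hypothetical, and the fourth row of the 8×8 matrix is taken to end in −15, which is the antisymmetric form that yields the 8/289 entry of the normalization vector.

```python
from fractions import Fraction

T8 = [
    [12, 12, 12, 12, 12, 12, 12, 12],
    [16, 15, 9, 4, -4, -9, -15, -16],
    [16, 6, -6, -16, -16, -6, 6, 16],
    [15, -4, -16, -9, 9, 16, 4, -15],  # assumed antisymmetric row
    [12, -12, -12, 12, 12, -12, -12, 12],
    [9, -16, 4, 15, -15, -4, 16, -9],
    [6, -16, 16, -6, -6, 16, -16, 6],
    [4, -9, 15, -16, 16, -15, 9, -4],
]
T4 = [
    [17, 17, 17, 17],
    [22, 10, -10, -22],
    [17, -17, -17, 17],
    [10, -22, 22, -10],
]


def squared_row_norms(matrix):
    """Sum of squared entries of each row (the expansion along that row)."""
    return [sum(v * v for v in row) for row in matrix]


norms8 = squared_row_norms(T8)
norms4 = squared_row_norms(T4)

# Each normalization entry listed in the text equals 32 / (squared row norm).
c8 = [Fraction(32, n) for n in norms8]
c4 = [Fraction(32, n) for n in norms4]

# Average one-dimensional expansion, a little more than 34**2 = 1156.
mean_expansion8 = Fraction(sum(norms8), len(norms8))

# Example scaling factor for an adapted SSE metric: (34^2 / 36^2)^2.
alpha = (34**2 / 36**2) ** 2
```

Exact fractions are used for the normalization entries so that the comparison with values such as 8/289 and 8/292 does not depend on floating-point rounding.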

The scaling factor α for a different transform is similarly determined considering expansion and normalization in the forward transform.

B. Distortion Estimation Tools with Rounding Compensation

FIG. 6 shows a generalized tool (600) for estimating sample-domain distortion in the transform domain using rounding compensation. The tool (600) includes a rounding compensation module (640) that improves the accuracy of pixel-based distortion estimated in the transform domain.

An encoder such as the encoder (500) of FIG. 5 incorporates the tool (600) in a controller or other module. Alternatively, another type of encoder or system incorporates the distortion estimation tool (600).

In FIG. 6, x denotes a block or other set of sample values (605) in a sample domain. For example, the sample values (605) are intensity values in the pixel domain for an image or video picture. For a video picture, the sample values of an intra-coded block are original intensity values, and the sample values of an inter-coded block are intensity values for a prediction residual. In encoders that use spatial prediction of sample values for intra-coded blocks, the intensity values of the block x can be for a prediction residual following spatial prediction. In some implementations, the intensity values are for the Y channel (i.e., luminance). In other implementations, the intensity values are for the U and V channels (i.e., chrominance) or for Y, U, and V channels at different times.

The frequency transform module (610) transforms the sample values (605) into a block or other set of transform coefficients (615). The transform module (610) applies a DCT, DCT-like transform, or other block-based transform. The size of the transform is 8×8, 8×4, 4×8, 4×4 or some other size.

The quantization module (620) quantizes the transform coefficients (615). Any of various types of quantization are applied. For example, the quantization module (620) applies uniform or non-uniform quantization, scalar or vector quantization, and/or adaptive or non-adaptive quantization. The quantization module (620) produces a block or other set of quantized coefficients.

The inverse quantization module (630) performs inverse quantization on the quantized transform coefficients. The inverse quantization module (630) produces a block or other set of de-quantized transform coefficients (635).

The rounding compensation module (640) adjusts one or more of the de-quantized transform coefficients (635). For example, the rounding compensation module (640) adds a rounding offset to the DC coefficient of the de-quantized transform coefficients (635). Alternatively, the rounding compensation module (640) adjusts one or more of the de-quantized transform coefficients (635) using some other mechanism. The rounding compensation module (640) produces the de-quantized transform coefficients (645), denoted with $\hat{X}^{rc}$ in FIG. 6, which include the rounding-compensated coefficient(s).

The distortion estimation module (650) computes a distortion estimate using the original transform coefficients (615) and the de-quantized transform coefficients (645) that include the rounding-compensated coefficient(s). The distortion metric is SSE, SAD, MSE, or some other metric, and can be scaled or unscaled.

Particular embodiments of encoders or other tools typically use a variation or supplemented version of the generalized tool (600). Depending on implementation, modules of the tool can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. In alternative embodiments, tools with different modules and/or other configurations of modules perform one or more of the described techniques.

C. Estimating Distortion with Rounding Compensation

FIGS. 7 and 8 show techniques (700, 800) for estimating sample-domain distortion from transform coefficients using rounding compensation. FIG. 7 shows a generalized technique (700), and FIG. 8 shows a technique (800) performed during video encoding.

1. Generalized Technique

With reference to FIG. 7, a tool such as a video encoder, video transcoder, image encoder, or image classification tool gets (710) a block or other set of transform coefficients to evaluate. For example, the tool gets de-quantized transform coefficients following partial decompression of the transform coefficients. Alternatively, the tool gets the transform coefficients in some other way.

The tool then performs (720) rounding compensation on one or more of the transform coefficients. For example, the tool adjusts the DC coefficient by a rounding offset associated with the value of the DC coefficient. In some implementations, the tool uses a rounding offset table that maps DC coefficients (or indices derived from DC coefficients) to rounding offsets. Alternatively, the tool performs rounding compensation for other and/or additional transform coefficients or uses another mechanism for determining rounding offsets.

The tool estimates (730) sample-domain distortion using the rounding-compensated transform coefficients. For example, the distortion metric is scaled or unscaled SAD, MSE, or SSE. The tool determines (740) whether there are any other sets of transform coefficients to be evaluated before making a decision based upon the results of the distortion estimate(s). If so, the tool continues with the next transform coefficients to evaluate. Otherwise, the tool makes (750) a decision based upon the results of the previous distortion estimate(s).
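The loop of FIG. 7 can be sketched as follows, using an unscaled SSE and a DC-only rounding offset table. The function names, the 2×2 stand-in blocks, and the single offset (+2 for a DC value of 5) are hypothetical; the candidates are modeled on blocks (b) and (c) from the earlier example, with the original coefficients modeled on block (a).

```python
def estimate_distortion(original, dequantized, dc_offsets):
    """Steps (720)-(730): compensate the DC coefficient, then compute an
    unscaled SSE against the original transform coefficients."""
    dc = dequantized[0][0]
    dc_rc = dc + dc_offsets.get(dc, 0)  # values missing from the table get 0
    total = 0
    for i, row in enumerate(dequantized):
        for j, value in enumerate(row):
            value_rc = dc_rc if (i, j) == (0, 0) else value
            total += (value_rc - original[i][j]) ** 2
    return total


def choose_candidate(original, candidates, dc_offsets):
    """Steps (710)-(750): evaluate every candidate set of coefficients and
    return the name of the one with the lowest estimate, plus all estimates."""
    estimates = {
        name: estimate_distortion(original, coeffs, dc_offsets)
        for name, coeffs in candidates.items()
    }
    best = min(estimates, key=estimates.get)
    return best, estimates


original = [[7, 0], [0, 0]]           # stand-in for block (a)
candidates = {
    "b": [[5, 0], [0, 0]],            # stand-in for block (b)
    "c": [[7, 1], [0, 0]],            # stand-in for block (c)
}

with_rc, with_rc_estimates = choose_candidate(original, candidates, {5: 2})
without_rc, without_rc_estimates = choose_candidate(original, candidates, {})
```

Under these assumed offsets, the decision changes once rounding is compensated: without compensation the candidate modeled on block (c) has the lower estimate, and with compensation the candidate modeled on block (b) does.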

2. Distortion Estimation in Video Encoding

With reference to FIG. 8, a video encoder uses distortion estimation with rounding compensation in making encoding decisions. The encoder uses this low-complexity approach to computing pixel-domain distortion accurately to improve the choice of coding parameters at the encoder without significantly increasing encoder complexity.

The video encoder sets (810) one or more coding parameters for a block to be encoded. Depending on implementation, at different times, the coding parameters can include: (1) whether the block should be encoded as intra or inter; (2) a number of motion vectors; (3) value(s) of motion vector(s); (4) a type of a frequency transform; (5) a size of a frequency transform (e.g., 8×8, 4×8, 8×4, or 4×4); (6) a quantization step size; (7) quantization thresholds; (8) a dead zone size; and/or (9) perceptual quantization factors. The technique (800) can be applied to set parameters for an intra block or macroblock, inter block or macroblock, or other unit of video. Alternatively, the coding parameters relate to other and/or additional coding options.

The encoder encodes (820) the block according to the set coding parameter(s), performing a frequency transform and quantization, and gets (830) the transform coefficients to be analyzed. The encoder inverse quantizes (840) the transform coefficients, producing de-quantized transform coefficients. The encoder then performs rounding compensation (850) on one or more of the de-quantized transform coefficients, for example, adjusting the DC coefficient by a rounding offset associated with the value of the DC coefficient.

The encoder estimates (860) pixel-domain distortion using the rounding-compensated transform coefficient(s). For example, the encoder computes a scaled SSE or other distortion metric that compares original transform coefficients with the rounding-compensated de-quantized transform coefficient(s) (from (850)) and the other de-quantized transform coefficients the encoder obtained.

The encoder determines (870) whether there are any other sets of transform coefficients to be evaluated before making an encoding decision based upon the distortion estimate(s). The encoder stops the evaluation process if constrained by time or resource requirements, if the encoder has evaluated a complete range of options, if the previous transform coefficients provide acceptable results, or according to some other criteria.

If the encoder continues evaluation, the encoder changes (880) one or more of the coding parameter(s) and encodes the block according to the current coding parameter(s). In changing the coding parameter(s), the encoder can consider the results of previous distortion estimates so as to more accurately identify coding parameters likely to provide good rate-distortion performance.

Otherwise, the encoder makes (890) an encoding decision based upon the previous distortion estimate(s). For example, the encoder reviews one or more of the previous distortion estimate(s), determines the lowest distortion estimate, and adopts coding parameter(s) used to produce the transform coefficients with the lowest distortion estimate.

Instead of evaluating distortion estimates for different coding parameters on a block-by-block basis, alternatively, an encoder can evaluate distortion estimates for different coding parameters on some other basis.

3. Distortion Estimation in Video Transcoding

As another example, a video transcoder estimates sample-domain distortion from transform coefficients with rounding compensation in transcoding operations. A video transcoder converts encoded video in one format and bit rate to encoded video in another format and/or bit rate. In “homogeneous” transcoding, a transcoder converts encoded video from a bit stream of a particular format at a first bit rate to a bit stream of the same format at a lower bit rate. In “heterogeneous” transcoding, a transcoder converts encoded video to a different format.

In one example of homogeneous transcoding, a transcoder uses distortion estimation with rounding compensation when selecting a quantization step size for a block in transcoded video. The transcoder applies different quantization step sizes and estimates pixel-domain distortion relative to the transform coefficients of the block of the original, encoded video. For a given quantization step size, after inverse quantization the transcoder gets a block of de-quantized transform coefficients and performs rounding compensation on one or more of them. The transcoder computes a distortion metric that compares transform coefficients from the first encoded video with the rounding-compensated transform coefficient(s) and the other de-quantized transform coefficients the transcoder obtained for the evaluated quantization step size. The transcoder reviews the distortion estimates and stops the evaluation process if it finds a quantization step size for the block that gives good rate-distortion performance for encoding at the lower bit rate. If the transcoder continues, it gets a block of transform coefficients for a different quantization step size.

For heterogeneous transcoding, a transcoder can use rounding compensation as in the homogeneous transcoding case, also using a scaling factor to account for energy differences between transform coefficients in the first format and transform coefficients in the second format.

4. Distortion Estimation in Image Classification

As another example, an image classification tool estimates sample-domain distortion from transform coefficients with rounding compensation in analysis and classification operations. For example, the tool determines how closely a first compressed image matches a second compressed image. To do this without fully decompressing each image, the tool compares transform coefficients from the first image to corresponding transform coefficients from the second image. By using rounding compensation, the tool accounts for rounding that would occur in an inverse frequency transform, thereby more accurately estimating pixel-domain distortion (here, differences between the two images). After comparing transform coefficients for blocks or other sections of the images, the image classification tool reviews distortion estimate(s) and makes a classification decision on how closely two images match. The image classification tool then outputs the classification decision to a file or screen.

Or, the image classification tool determines how closely an image matches an image signature or image pattern, comparing rounding-compensated transform coefficients of the image to the image signature or image pattern.

D. Computing Rounding Offset Tables

FIG. 9 shows a technique (900) for computing a rounding offset table. A person, team, or other entity developing a tool such as the encoder shown in FIG. 5 performs the technique (900). Typically, the technique (900) is performed off-line, and the table is used thereafter during encoding or other operations.

To start, a range of de-quantized transform coefficient values is identified (910). In an example implementation, the range of samples is [0 . . . 255] and, following an 8×8 forward frequency transform, quantization, and inverse quantization, the range of de-quantized DC coefficients is [−1816 . . . 1816]. More specifically, $\hat{X}(0,0)$ can take the value of any even integer in the range of [−1816 . . . 1816]; odd integers are not possible due to the rules applied in inverse quantization. In different implementations, the range of values for de-quantized transform coefficients is different or the transform size is different. Different ranges of input sample values typically result in different ranges of de-quantized transform coefficient values. Depending on size (e.g., 4×4, 4×8, or 8×4) or type, different transforms often have different scaling factors, which also results in different ranges of de-quantized transform coefficient values. Finally, different quantization and inverse quantization rules affect which de-quantized transform coefficient values are possible within a range. Although only even values are possible in the immediately preceding example implementation, the approach can be applied to non-integer quantization parameters (“QPs”) in which QPs can have ½-step increments. In this case, reconstructed DC coefficients can have odd values.

Next, rounding offsets for the de-quantized transform coefficient values in the range are computed (920). In the example implementation, the rounding offsets for DC coefficient values are computed and stored in a table labeled DCOFFSET[ ]. For each de-quantized coefficient value k in the identified range (here, even numbers in the range of [−1816 . . . 1816]), an inverse frequency transform is performed on a block having that value as its DC coefficient and zeros for all AC coefficients. A forward frequency transform is performed on the result. The difference is then computed between the DC coefficient of the forward transform results and the initial de-quantized coefficient value k. (To compensate for scaling in the transforms in some implementations, k is multiplied by a scaling factor.) The following equation yields rounding offset values for de-quantized coefficient values from −1816 to 1816, which can be stored in an array DCOFFSET.

${{{DCOFSET}\left\lbrack \frac{k + 1816}{2} \right\rbrack} = {\left( {D\; C\mspace{14mu} {coefficient}\mspace{14mu} {of}\mspace{14mu} {T\left( {T^{- 1}\left( {\hat{X}}_{k} \right)} \right)}} \right) - \left( {k \times 16} \right)}},$

where $\hat{X}_k$ is a block of de-quantized transform coefficients having a value of k for the DC coefficient and a value of zero for each AC coefficient. The following table shows example values of k and corresponding rounding offsets.

TABLE 1 DC Rounding Offsets in Example Implementation for Scaling Factor = 16

    k       DC coeff. of $T(T^{-1}(\hat{X}_k))$    k × 16     rounding offset
  −1816     −29014                                 −29056      42
  −1814     −29014                                 −29024      10
  −1812     −29014                                 −28992     −22
  −1810     −29014                                 −28960     −54
  −1808     −28900                                 −28928      28
  −1806     −28900                                 −28896      −4
  −1804     −28900                                 −28864     −36
  −1802     −28786                                 −28832      46
  −1800     −28786                                 −28800      14
  −1798     −28786                                 −28768     −18
  −1796     −28786                                 −28736     −50
  −1794     −28672                                 −28704      32
  −1792     −28672                                 −28672       0
  −1790     −28672                                 −28640     −32
  −1788     −28559                                 −28608      49
  −1786     −28559                                 −28576      17
  −1784     −28559                                 −28544     −15
  −1782     −28559                                 −28512     −47
  −1780     −28445                                 −28480      35
  −1778     −28445                                 −28448       3
  −1776     −28445                                 −28416     −29
  −1774     −28331                                 −28384      53
  −1772     −28331                                 −28352      21
  −1770     −28331                                 −28320     −11
  −1768     −28331                                 −28288     −43
  −1766     −28217                                 −28256      39
  −1764     −28217                                 −28224       7
  −1762     −28217                                 −28192     −25
  −1760     −28104                                 −28160      56
  −1758     −28104                                 −28128      24
  −1756     −28104                                 −28096      −8
  −1754     −28104                                 −28064     −40
  −1752     −27990                                 −28032      42
  −1750     −27990                                 −28000      10
  −1748     −27990                                 −27968     −22
  −1746     −27990                                 −27936     −54
  −1744     −27876                                 −27904      28
  . . .     . . .                                  . . .      . . .
   1808      28899                                  28928     −29
   1810      29013                                  28960      53
   1812      29013                                  28992      21
   1814      29013                                  29024     −11
   1816      29013                                  29056     −43

In other implementations, the scaling factor of 16 is not needed since normalization is completed in the inverse and forward transforms. The scaling factor of 16 is not incorporated in the computation of rounding offsets, and the following equation yields rounding offset values for de-quantized coefficient values from −1816 to 1816, which can be stored in an array DCOFFSET.

${{DCOFFSET}\left\lbrack \frac{k + 1816}{2} \right\rbrack} = {\left( {D\; C\mspace{14mu} {coefficient}\mspace{14mu} {of}\mspace{14mu} {T\left( {T^{- 1}\left( {\hat{X}}_{k} \right)} \right)}} \right) - {k.}}$

Thus, for example, if the DC coefficient is −1816, −1814, −1812, or −1810, the inverse transform produces a residual block with values of −255, after rounding, and applying the forward transform to such a block produces a DC coefficient of −1813. The following table shows example values of k and corresponding rounding offsets.

TABLE 2 DC Rounding Offsets in Example Implementation with No Scaling Factor

    k       DC coeff. of $T(T^{-1}(\hat{X}_k))$    k × 1      rounding offset
  −1816     −1813                                  −1816       3
  −1814     −1813                                  −1814       1
  −1812     −1813                                  −1812      −1
  −1810     −1813                                  −1810      −3
  −1808     −1806                                  −1808       2
  −1806     −1806                                  −1806       0
  −1804     −1806                                  −1804      −2
  −1802     −1799                                  −1802       3
  −1800     −1799                                  −1800       1
  −1798     −1799                                  −1798      −1
  −1796     −1799                                  −1796      −3
  −1794     −1792                                  −1794       2
  −1792     −1792                                  −1792       0
  −1790     −1792                                  −1790      −2
  −1788     −1785                                  −1788       3
  −1786     −1785                                  −1786       1
  −1784     −1785                                  −1784      −1
  −1782     −1785                                  −1782      −3
  −1780     −1778                                  −1780       2
  −1778     −1778                                  −1778       0
  −1776     −1778                                  −1776      −2
  −1774     −1771                                  −1774       3
  −1772     −1771                                  −1772       1
  −1770     −1771                                  −1770      −1
  −1768     −1771                                  −1768      −3
  −1766     −1764                                  −1766       2
  −1764     −1764                                  −1764       0
  −1762     −1764                                  −1762      −2
  −1760     −1756                                  −1760       4
  −1758     −1756                                  −1758       2
  −1756     −1756                                  −1756       0
  −1754     −1756                                  −1754      −2
  −1752     −1749                                  −1752       3
  −1750     −1749                                  −1750       1
  −1748     −1749                                  −1748      −1
  −1746     −1749                                  −1746      −3
  −1744     −1742                                  −1744       2
  . . .     . . .                                  . . .      . . .
   1808      1806                                   1808      −2
   1810      1813                                   1810       3
   1812      1813                                   1812       1
   1814      1813                                   1814      −1
   1816      1813                                   1816      −3

Alternatively, another mechanism is used to compute rounding offsets for the de-quantized transform coefficient values in the identified range.

Different ranges and transforms/inverse transforms result in different rounding offsets. For example, if odd values of reconstructed DC coefficients are possible (due to non-integer QPs or otherwise), the offsets of the DCOFFSET[ ] table include offsets for odd DC values.

Table 3 shows example values of k and corresponding rounding offsets for a generalized and idealized example in which the range of de-quantized transform coefficient values is any integer and the scaling factor for the transform/inverse transform is 1.

TABLE 3 DC Rounding Offsets in Generalized Example

    k       DC coeff. of $T(T^{-1}(\hat{X}_k))$    k × 1      rounding offset
  . . .     . . .                                  . . .      . . .
    −3       −5                                     −3        −2
    −2        0                                     −2         2
    −1        0                                     −1         1
     0        0                                      0         0
     1        0                                      1        −1
     2        0                                      2        −2
     3        5                                      3         2
     4        5                                      4         1
     5        5                                      5         0
     6        5                                      6        −1
     7        5                                      7        −2
     8       10                                      8         2
     9       10                                      9         1
  . . .     . . .                                  . . .      . . .
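As a rough illustration of the offset computation, the following Python sketch reproduces the rows of the generalized example. The transform pair for that example is not specified, so `round_trip` is a hypothetical stand-in that snaps a value to the nearest multiple of 5, which happens to match the tabulated DC coefficients of $T(T^{-1}(\hat{X}_k))$. The function names are illustrative only and are not part of the described implementation.

```python
def round_trip(k: int, step: int = 5) -> int:
    """Hypothetical stand-in for the DC coefficient of T(T^-1(X_k)).

    Assumption: the inverse/forward transform pair acts like rounding k to
    the nearest multiple of `step`, as the generalized example suggests.
    """
    # Adding half a step before floor division rounds to the nearest multiple.
    return ((k + step // 2) // step) * step


def compute_rounding_offsets(values) -> dict:
    """Map each de-quantized value k to round_trip(k) - k (scaling factor 1)."""
    return {k: round_trip(k) - k for k in values}


offsets = compute_rounding_offsets(range(-3, 10))
for k, offset in offsets.items():
    print(k, round_trip(k), offset)
```

A real tool would replace `round_trip` with the actual inverse and forward frequency transforms applied to a block whose only nonzero coefficient is the DC coefficient.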

A rounding offset table is then created (940). For example, the rounding offset table is an array that maps de-quantized transform coefficient values to corresponding rounding offset values. A tool looks up a rounding offset by direct indexing with the value of a de-quantized transform coefficient. Alternatively, the rounding offset table is represented with a different data structure.

Optionally, a periodic pattern in the rounding offset values is identified (930) before the rounding offset table is created. For example, a person performing the technique (900) identifies a periodic pattern visually, by plotting rounding offsets versus de-quantized transform coefficient values, or using analytical software. In general, periodicity in the DC rounding offsets allows for reduction in table size. In the generalized, idealized example of Table 3, the DC rounding offset is periodic with a period of 5. In the example implementation of Table 1 or Table 2, the DC rounding offset is periodic with a period of 32. The array DCOFFSET[ ] stores the first 32 rounding offset values, and the length N_(rounding) is 32. Specifically, the values in the DC rounding offset table are DCOFFSET[32]={42, 10, −22, −54, 28, −4, −36, 46, 14, −18, −50, 32, 0, −32, 49, 17, −15, −47, 35, 3, −29, 53, 21, −11, −43, 39, 7, −25, 56, 24, −8, −40}. To look up the correct DC rounding offset for a de-quantized DC coefficient, the de-quantized DC coefficient value is converted to a table index by an index mapping function ƒ(·) as follows: ƒ(i)=((i+1816)>>1) & 31.

If the scaling factor of 16 is not incorporated in the computation of rounding offsets (see Table 2), the values in the table are DCOFFSET[32]={3, 1, −1, −3, 2, 0, −2, 3, 1, −1, −3, 2, 0, −2, 3, 1, −1, −3, 2, 0, −2, 3, 1, −1, −3, 2, 0, −2, 4, 2, 0, −2}.
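The lookup described above can be sketched in Python as follows. The two arrays are copied from the text, and the index mapping is ƒ(i)=((i+1816)>>1) & 31. The names `DCOFFSET_SCALED`, `DCOFFSET_UNSCALED`, `dc_index`, and `dc_rounding_offset` are illustrative assumptions, not identifiers from the described encoder.

```python
# Offsets copied from the text: one period (32 entries) for the variant with
# scaling factor 16, and one period for the variant with no scaling factor.
DCOFFSET_SCALED = [42, 10, -22, -54, 28, -4, -36, 46, 14, -18, -50, 32, 0, -32,
                   49, 17, -15, -47, 35, 3, -29, 53, 21, -11, -43, 39, 7, -25,
                   56, 24, -8, -40]
DCOFFSET_UNSCALED = [3, 1, -1, -3, 2, 0, -2, 3, 1, -1, -3, 2, 0, -2,
                     3, 1, -1, -3, 2, 0, -2, 3, 1, -1, -3, 2, 0, -2,
                     4, 2, 0, -2]

DC_MIN = -1816  # lower bound of the de-quantized DC range in the example


def dc_index(i: int) -> int:
    """Index mapping f(i) = ((i + 1816) >> 1) & 31 for even DC values."""
    # Shift by 1 because only even values occur; mask with 31 to wrap the
    # index around the 32-entry period.
    return ((i - DC_MIN) >> 1) & 31


def dc_rounding_offset(i: int, table=DCOFFSET_SCALED) -> int:
    """Look up the rounding offset for de-quantized DC coefficient i."""
    return table[dc_index(i)]


print(dc_rounding_offset(-1816))                     # first entry of the period
print(dc_rounding_offset(-1752))                     # one full period later
print(dc_rounding_offset(-1760, DCOFFSET_UNSCALED))  # unscaled variant
```

Masking with 31 works here only because the period (32) is a power of two; other periods would need a modulo operation instead.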

The index-mapping function is different for different ranges of de-quantized transform coefficient values and different periodic patterns. The index-mapping function can use different operations, for example, computing “coefficient value” MOD “period.” Different ranges of de-quantized transform coefficient values and different forward/inverse transforms result in different periodic patterns.
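A MOD-based index mapping can be sketched for the generalized example of Table 3, whose offsets repeat with a period of 5. The sketch below also shows one simple way analytical software might detect the period from a run of computed offsets. The helper names are hypothetical, and the brute-force period search is an assumption made for illustration, not a method taken from the text.

```python
def smallest_period(offsets) -> int:
    """Return the smallest p with offsets[i] == offsets[i + p] for all valid i."""
    n = len(offsets)
    for p in range(1, n):
        if all(offsets[i] == offsets[i + p] for i in range(n - p)):
            return p
    return n  # no repetition found within the sample


# Rounding offsets for k = 0 .. 9, taken from Table 3.
table3_offsets = [0, -1, -2, 2, 1, 0, -1, -2, 2, 1]

period = smallest_period(table3_offsets)
compact_table = table3_offsets[:period]  # keep a single period only


def lookup_offset(k: int) -> int:
    """Index mapping 'coefficient value MOD period' for the generalized example."""
    # Python's % yields a non-negative result for a positive modulus, so
    # negative coefficient values also map to a valid table index.
    return compact_table[k % period]


print(period, compact_table)
print(lookup_offset(-3), lookup_offset(8))
```

The sample passed to `smallest_period` should cover several periods; a sample that is too short can report a spurious period.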

IV. Extensions

Although the techniques and tools described herein are in places presented in the context of video encoding, sample-domain distortion estimation from transform coefficients with rounding compensation may be applied to other data compression schemes in which an integer-based transform (especially a DCT-like transform) is used. For example, the techniques and tools may be applied when encoding images with an integer-based, DCT-like transform. In addition, as noted above, the techniques and tools may be applied in transcoding applications and image classification and analysis applications.

Finally, an encoder can use results of rounding compensation in operations other than distortion estimation. For example, an encoder groups quantized DC coefficient values using rounding offset information. The quantized DC coefficient values within a group have rounding offsets that make the values within the group equivalent for purposes of estimating distortion. In terms of the example of Table 1, the values −1816, −1814, −1812, and −1810 are grouped, since T(T⁻¹(·)) of each of these values results in the same value (−29014). The values −1808, −1806, and −1804 are in a second group, and so on. The encoder can perform the grouping off-line. During entropy coding, if a first DC coefficient value in a group can be represented with fewer bits than a second DC coefficient value in the same group, the encoder uses the first DC coefficient value instead of the second DC coefficient value in the encoded data.
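The grouping idea can be sketched in Python under stated assumptions. The round-trip value is recovered as k × 16 plus the rounding offset, which follows from the definition of the offsets for the scaling factor of 16. The bit-cost function is purely hypothetical (the text does not say how bits are counted), and all function names are illustrative.

```python
from collections import defaultdict

# One period of rounding offsets for scaling factor 16, copied from the text.
DCOFFSET = [42, 10, -22, -54, 28, -4, -36, 46, 14, -18, -50, 32, 0, -32,
            49, 17, -15, -47, 35, 3, -29, 53, 21, -11, -43, 39, 7, -25,
            56, 24, -8, -40]


def round_trip_value(k: int) -> int:
    """DC coefficient of T(T^-1(X_k)), recovered as k * 16 + offset."""
    return k * 16 + DCOFFSET[((k + 1816) >> 1) & 31]


def group_equivalent(values):
    """Group DC values whose round-trip results are identical."""
    groups = defaultdict(list)
    for k in values:
        groups[round_trip_value(k)].append(k)
    return list(groups.values())


def cheapest(group, bit_cost):
    """Pick the group member with the lowest cost under `bit_cost`."""
    return min(group, key=bit_cost)


def toy_bit_cost(k: int) -> int:
    # Hypothetical cost model: smaller magnitudes are assumed cheaper.
    # A real encoder would consult its entropy coder instead.
    return abs(k)


groups = group_equivalent(range(-1816, -1802, 2))
print(groups)
print([cheapest(g, toy_bit_cost) for g in groups])
```

Because the grouping depends only on the offset table, it can be computed once off-line and reused during encoding.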

Having described and illustrated the principles of our invention with reference to various embodiments, it will be recognized that the various embodiments can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment, unless indicated otherwise. Various types of general purpose or specialized computing environments may be used with or perform operations in accordance with the teachings described herein. Elements of embodiments shown in software may be implemented in hardware and vice versa.

In view of the many possible embodiments to which the principles of our invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.

1.-12. (canceled)
13. An encoder comprising: a frequency transformer for applying a frequency transform to convert samples in a sample domain into transform coefficients in a transform domain; a quantizer for quantizing the transform coefficients; an entropy encoder for entropy encoding the quantized transform coefficients; an inverse quantizer for de-quantizing the quantized transform coefficients; and a controller for making encoding decisions after considering post-inverse frequency transform rounding effects on de-quantized transform coefficient values.
14. The encoder of claim 13 wherein the controller includes a module for estimating pixel-domain distortion between original transform coefficients of a block and corresponding de-quantized transform coefficients of the block by: compensating for rounding in at least one of the corresponding de-quantized transform coefficients of the block; and computing the estimated pixel-domain distortion using the original transform coefficients of the block, the at least one rounding-compensated coefficient and other coefficients of the de-quantized transform coefficients of the block.
15. The encoder of claim 14 further comprising a table for use in the compensating by, for each of the at least one rounding-compensated coefficient: determining an index from the coefficient; looking up the index in the table to determine a corresponding offset; and adjusting the coefficient by the corresponding offset.
16. The encoder of claim 14 wherein the encoding decisions are based at least in part on the estimated pixel-domain distortion.
17. The encoder of claim 13 wherein the encoding decisions comprise setting coding parameters for two or more of intra/inter encoding, quantization dead zone size, transform size, number of motion vectors, motion vector value, perceptual quantization, quantization thresholding, and quantization step size.
18. The encoder of claim 13 wherein the rounding effects indicate multiple values in a group of the de-quantized transform coefficient values affect distortion equivalently, and wherein the encoding decisions comprise selecting between the multiple values based on bits required for the multiple values respectively.
19. A method comprising: identifying a range of values for a de-quantized transform coefficient; computing a rounding offset for each of plural values in the range; identifying a periodic pattern in the computed rounding offsets; mapping representative values to corresponding rounding offsets in an offset table, wherein the corresponding rounding offsets show at least one period of the pattern without the offset table including all values in the range; and storing the offset table.
20. The method of claim 19 further comprising distributing the offset table with a video encoder.
21. The method of claim 19, wherein the offset table maps representative de-quantized values to representative offsets for less than all of the de-quantized values.
22. The method of claim 19 further including: determining an index from a first coefficient; determining an offset by looking up the index in the offset table; and adjusting the first coefficient by the offset.